Sparse Randomized Partition Trees for Nearest Neighbor Search

نویسندگان

  • Kaushik Sinha
  • Omid Keivani
چکیده

Randomized partition trees have recently been shown to be very effective in solving nearest neighbor search problem. In spite of enjoying strong theoretical guarantee, it suffers from high space complexity, since each internal node of the tree needs to store a d dimensional projection direction leading to aO(nd) space complexity for a dataset of size n. Inspired by the fast Johnson-Lindenstrauss transform, in this paper, we propose a sparse version of randomized partition tree where each internal node needs to store only a few non-zero entries, as opposed to all d entries, leading to significant space savings without sacrificing much in terms of nearest neighbor search accuracy. As a by product of this, query time of our proposed method is slightly better than that of its non-sparse counterpart for large dataset size. Our theoretical results indicate that our proposed method enjoys the same theoretical guarantee as that of the original non-sparse RP-tree. Experimental evaluations on four real world dataset strongly suggest that nearest neighbor search performance of our proposed sparse RP-tree is very similar to that of its non-sparse counterpart in terms of accuracy and number of retrieved points.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fast £1-norm Nearest Neighbor Search Using A Simple Variant of Randomized Partition Tree

For big data applications, randomized partition trees have recently been shown to be very effective in answering high dimensional nearest neighbor search queries with provable guarantee, when distances are measured using 2 norm. Unfortunately, if distances are measured using 1 norm, the same theoretical guarantee does not hold. In this paper, we show that a simple variant of randomized partitio...

متن کامل

Randomized partition trees for exact nearest neighbor search

The k-d tree was one of the first spatial data structures proposed for nearest neighbor search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization and overlapping cells, have proved to be successful in practice. We analyze three such schemes. We show that the probability that they fail to find the nearest neighbor, for any data set and any query poin...

متن کامل

Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?

Recent theory work has found that a special type of spatial partition tree – called a random projection tree – is adaptive to the intrinsic dimension of the data from which it is built. Here we examine this same question, with a combination of theory and experiments, for a broader class of trees that includes k-d trees, dyadic trees, and PCA trees. Our motivation is to get a feel for (i) the ki...

متن کامل

Consequences of Common Topological Rearrangements for Partition Trees in Phylogenomic Inference

In phylogenomic analysis the collection of trees with identical score (maximum likelihood or parsimony score) may hamper tree search algorithms. Such collections are coined phylogenetic terraces. For sparse supermatrices with a lot of missing data, the number of terraces and the number of trees on the terraces can be very large. If terraces are not taken into account, a lot of computation time ...

متن کامل

Estimation of Density using Plotless Density Estimator Criteria in Arasbaran Forest

    Sampling methods have a theoretical basis and should be operational in different forests; therefore selecting an appropriate sampling method is effective for accurate estimation of forest characteristics. The purpose of this study was to estimate the stand density (number per hectare) in Arasbaran forest using a variety of the plotless density estimators of the nearest neighbors sampling me...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017